Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision
Authors
Abstract
Bilingual lexicon extraction is useful, especially for low-resource languages that can benefit from high-resource languages. Uyghur is a derivative language, and its resources are scarce and noisy. Moreover, it is difficult to find bilingual resources through which Uyghur can utilize the linguistic knowledge of other major languages, such as Chinese or English. There is little related research on unsupervised Chinese-Uyghur bilingual lexicon extraction, and existing methods mainly focus on term extraction based on translated parallel corpora. Accordingly, this paper proposes an effective method to extract a Chinese-Uyghur bilingual dictionary by combining an inter-word relationship matrix with cross-language word embedding vectors mapped by a neural network. A seed dictionary is used as a weak supervision signal, and a small amount of data is used to map the multilingual word vectors into a unified vector space. Because the word particles of the two languages are not well coordinated, stems are used as the main particles. Strong semantic relationships between words are used to associate semantic information across the two languages. Two retrieval indicators, nearest neighbor (NN) retrieval and cross-domain similarity local scaling (CSLS), are used to calculate similarities and extract the bilingual dictionaries. The experimental results show that the accuracy of the proposed method is improved to 65.06%. The method helps improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translation.
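The abstract describes three steps. Word vectors are first mapped into a shared space under seed-dictionary supervision. Translations are then retrieved with either NN or CSLS. The sketch below illustrates that generic pipeline under stated assumptions, and it is not the paper's implementation. It uses orthogonal Procrustes as the seed-supervised mapping, a common choice swapped in here for illustration. The paper's neural-network mapping and inter-word relationship matrix are not reproduced. Embeddings are plain NumPy arrays. The helper names (`normalize`, `procrustes`, `nn_scores`, `csls_scores`, `extract`) are hypothetical, and so is the neighborhood size `k=10`.

```python
import numpy as np


def normalize(emb):
    """L2-normalize each row so that dot products are cosine similarities."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def procrustes(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F (assumed mapping, not the paper's).

    Row i of `src` is assumed to translate to row i of `tgt` (seed pairs).
    Closed form: W = U @ Vt, where U, S, Vt = svd(src.T @ tgt).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nn_scores(mapped_src, tgt):
    """Nearest-neighbor retrieval score: plain cosine similarity."""
    return mapped_src @ tgt.T


def csls_scores(mapped_src, tgt, k=10):
    """CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y)."""
    sim = mapped_src @ tgt.T
    k_t = min(k, sim.shape[1])
    k_s = min(k, sim.shape[0])
    # Mean similarity of each mapped source word to its k nearest target words.
    r_tgt = np.sort(sim, axis=1)[:, -k_t:].mean(axis=1)
    # Mean similarity of each target word to its k nearest mapped source words.
    r_src = np.sort(sim, axis=0)[-k_s:, :].mean(axis=0)
    return 2 * sim - r_tgt[:, None] - r_src[None, :]


def extract(src_emb, tgt_emb, seed_pairs, metric="csls", k=10):
    """Return, for every source word index, the index of its best target word."""
    src = normalize(src_emb)
    tgt = normalize(tgt_emb)
    s_idx = [s for s, _ in seed_pairs]
    t_idx = [t for _, t in seed_pairs]
    # Weak supervision: the mapping is fitted on the seed dictionary only.
    w = procrustes(src[s_idx], tgt[t_idx])
    mapped = normalize(src @ w)
    if metric == "csls":
        scores = csls_scores(mapped, tgt, k)
    else:
        scores = nn_scores(mapped, tgt)
    return scores.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical toy data: the "source" space is an exact rotation of the
    # "target" space, so a good mapping should recover every translation.
    rng = np.random.default_rng(0)
    tgt = rng.normal(size=(60, 32))
    q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
    src = tgt @ q.T
    seeds = [(i, i) for i in range(40)]  # seed dictionary covers 40 of 60 words
    for metric in ("nn", "csls"):
        pred = extract(src, tgt, seeds, metric=metric)
        print(metric, "accuracy:", (pred == np.arange(60)).mean())
```

CSLS subtracts each word's average similarity to its k nearest neighbors in the other language. This penalizes "hub" vectors that lie close to many words. Plain NN ranks by raw cosine similarity and tends to return the same hubs for many different queries.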
Similar resources
Bilingual Lexicon Extraction From Internet
This paper introduces an experimental system which can extract translations of words and phrases from the Internet through alignment on parallel WWW pages. The automatic extraction takes place online, is language independent and incrementally formed after a post-editing step by a human being. Actually the experimental system can extract words and phrases between pairs of the languages English, ...
Corpus-Driven Bilingual Lexicon Extraction
This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work. 1 Machine T...
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...
Low-resource bilingual lexicon extraction using graph based word embeddings
In this work we focus on the task of automatically extracting bilingual lexicon for the language pair Spanish-Nahuatl. This is a low-resource setting where only a small amount of parallel corpus is available. Most of the downstream methods do not work well under low-resource conditions. This is especially true for the approaches that use vectorial representations like Word2Vec. Our proposal is ...
Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision
Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e.g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is h...
Journal
Journal title: Information
Year: 2022
ISSN: 2078-2489
DOI: https://doi.org/10.3390/info13040175